Identifying influential spreaders in complex networks for disease spread and control

Identifying influential spreaders is an important task in controlling the spread of information and epidemic diseases in complex networks. Many recent studies have indicated that the identification of influential spreaders depends on the spreading dynamics. Finding a generally optimal ranking of node importance is difficult because of the complexity of network structures and the physical background of the dynamics. In this paper, we use four metrics, namely, betweenness, degree, H-index, and coreness, to measure node centrality, and use them to construct disease spreading models and targeted immunization strategies. Numerical simulations show that the spreading process based on betweenness centrality leads to the widest range of propagation and the smallest epidemic threshold for all six networks studied (four real networks and two scale-free networks generated with the Barabási–Albert (BA) algorithm). The targeted immunization strategy based on betweenness centrality is the most effective for the BA scale-free networks but performs poorly on the real networks in identifying the most important spreaders for disease control. The immunization strategy based on node degree is the most effective for the four real networks. These findings show that the targeted immunization strategy based on betweenness centrality works best for standard scale-free networks, whereas that based on node degree works best for nonstandard scale-free networks. The results provide insights into how different metrics of node importance affect disease transmission and control.

www.nature.com/scientificreports/

…capabilities of the nodes were investigated 15, and the generalized reachability is more closely related to the propagation process in spatial networks than in nonspatial networks. Many studies of disease transmission in complex networks use the mean-field method 16,17, which is simple and easy to understand. The mean-field method is widely used in the analysis and prediction of epidemic propagation dynamics 18–20, the voter model 21, and synchronization phenomena 22,23. It treats the effects collectively at the level of classes of nodes, assuming that nodes of the same degree behave equally. For homogeneous 24 and uniform networks (whose degree distribution follows a normal or Poisson distribution), the homogeneous mean-field method has been applied to model analysis. For heterogeneous networks such as Barabási–Albert (BA) scale-free networks 25, heterogeneous mean-field methods have been used by assuming that all nodes with equal degree k follow the same dynamic process. Many mean-field methods model the spreading process on the basis of the node degree distribution, while the attributes used to characterize node importance include degree, betweenness, H-index, and coreness 26,27. The relationship between degree, H-index, and coreness was shown in 26: degree, H-index, and coreness are the initial, intermediate, and steady states of a sequence obtained by iteratively applying the H-index operator. Since the mean-field method assumes that nodes with the same degree follow the same dynamics, can a mean-field model be constructed by assuming that nodes with the same H-index, coreness, or betweenness follow the same dynamics?
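The degree, H-index, and coreness relationship cited above can be illustrated with a short sketch. Assuming an undirected graph stored as a plain adjacency dictionary (a hypothetical representation, not the authors' code), the H operator returns the largest h such that at least h neighbour values are at least h; applying it repeatedly, starting from the degrees, should reach a fixed point equal to the k-core index computed by peeling. The function names and the toy graph are illustrative only.

```python
def h_operator(values):
    """H operator: the largest h such that at least h of `values` are >= h."""
    h = 0
    for i, v in enumerate(sorted(values, reverse=True), start=1):
        if v >= i:
            h = i
        else:
            break
    return h


def h_index_sequence(adj):
    """Apply the H operator repeatedly, starting from node degrees.

    adj maps each node to a list of its neighbours. Returns the list of
    node -> value maps: [degree, H-index, H^(2), ..., fixed point].
    """
    current = {u: len(nbrs) for u, nbrs in adj.items()}  # order 0: degree
    sequence = [current]
    while True:
        nxt = {u: h_operator([current[v] for v in adj[u]]) for u in adj}
        if nxt == current:  # fixed point reached
            return sequence
        sequence.append(nxt)
        current = nxt


def coreness(adj):
    """k-core index of every node, by peeling nodes of minimum remaining degree."""
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        u = min(remaining, key=deg.__getitem__)
        k = max(k, deg[u])  # core index never decreases along the peeling order
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                deg[v] -= 1
    return core


if __name__ == "__main__":
    # Toy graph: a triangle (0, 1, 2) with three pendant nodes attached to node 0.
    adj = {0: [1, 2, 3, 4, 5], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0], 5: [0]}
    seq = h_index_sequence(adj)
    print(seq[0][0], seq[1][0])      # prints: 5 2 (degree and H-index of node 0)
    print(seq[-1] == coreness(adj))  # prints: True
```

The peeling routine is included only to check the fixed point independently; the sequence itself never calls it.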
Although many studies have provided evidence for the presence of influential spreaders in disease transmission, the conclusions are not universal. No consensus has been reached on the definition of network centrality because different metrics are used to quantify node centrality for specific physical backgrounds 28. Controlling spreading processes on complex networks, such as the spread of rumors through social interactions and the spread of infection in a population, has attracted increasing attention. Sk et al. proposed a mathematical model to forecast COVID-19 and assess control strategies, and found that investing in quarantined individuals was more effective than investing in isolated individuals in reducing cases 29. Li et al. developed an SEIQR difference-equation model to investigate the dynamics of COVID-19; the advantage of this model is that it does not require estimating the model's initial values 30. A statistical and mathematical analysis of COVID-19 case reports, human mobility, and temporal and spatial changes in transmission control measures concluded that China's control measures made it possible to avoid hundreds of thousands of cases 31. Considering the impact of lockdown and medical resources on the spread of COVID-19 in Wuhan, Sun et al. proposed a dynamic model of epidemic transmission and found that the more abundant the medical resources, the smaller the final epidemic size 32. Immunizing a small number of the highest-degree nodes can effectively eliminate virus spreading in scale-free networks 33. An improved mean-field model of immunization that uses degree centrality before and after immunization was proposed 34, showing that the epidemic threshold for infectious diseases is low. A new centrality measure combining coreness and eigenvector centrality was proposed to identify influential spreaders 35, and the results showed that this approach outperforms other benchmarks.
Many metrics of node centrality are used to select the nodes for targeted immunization.
In this study, heterogeneous mean-field propagation models based on node betweenness centrality, degree, H-index, and coreness are constructed. Numerical simulations show that the spreading process based on betweenness centrality leads to the widest range of propagation. For disease control, numerical simulations of uniform immunization and of targeted immunization based on betweenness centrality, degree, H-index, and coreness show that the betweenness-based strategy performs competitively on BA scale-free networks, whereas the degree-based strategy performs competitively compared with the other strategies on real networks.

Methods
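Epidemic model construction strategies. The classical susceptible-infected-susceptible (SIS) model 16 is used to describe the spreading dynamics on heterogeneous networks at the level of classes of nodes. Assume that all nodes with the same degree (betweenness, H-index, or coreness) behave equally. Define the partial prevalence $\rho_k(t)$ ($\rho_b(t)$, $\rho_h(t)$, and $\rho_s(t)$) as the fraction of infected nodes with a given degree $k$ (betweenness $b$, H-index $h$, and coreness $s$). The goal is to understand how measuring node importance with four different metrics affects the epidemic process, so we compare the prevalence of the four resulting propagation dynamics. In a heterogeneous network, let $P(k)$ denote the probability that a randomly chosen node has degree $k$, and let $P(k'|k)$ be the conditional probability that a node of degree $k$ is connected to a node of degree $k'$. The normalization conditions $\sum_k P(k) = 1$ and $\sum_{k'} P(k'|k) = 1$ hold. The average number of links connecting a node of degree $k$ to nodes of degree $k'$ is $kP(k'|k)$. Thus, with the recovery rate normalized to $\mu = 1$, the dynamic evolution can be written as

$$\frac{d\rho_k(t)}{dt} = -\rho_k(t) + \lambda k \left[1 - \rho_k(t)\right] \Theta_k(t), \tag{1}$$

where

$$\Theta_k(t) = \sum_{k'} P(k'|k)\, \rho_{k'}(t) = \frac{1}{\langle k \rangle} \sum_{k'} k' P(k')\, \rho_{k'}(t).$$

The first term on the right-hand side of Eq. (1) represents the recovery of infected nodes of degree $k$. The second term represents susceptible nodes of degree $k$ being infected by their infected neighbors, where $\lambda$ is the spreading rate, $1 - \rho_k(t)$ is the fraction of susceptible nodes of degree $k$, and $\Theta_k(t)$ is the probability that a link emanating from a node of degree $k$ points to an infected node. The second equality for $\Theta_k(t)$ holds for uncorrelated networks, where $P(k'|k) = k'P(k')/\langle k \rangle$ and $\langle k \rangle$ is the first moment of the degree distribution. Setting the right-hand side of Eq. (1) to zero gives the stationary state

$$\rho_k = \frac{\lambda k \Theta}{1 + \lambda k \Theta}, \qquad \Theta = \frac{1}{\langle k \rangle} \sum_k k P(k)\, \frac{\lambda k \Theta}{1 + \lambda k \Theta},$$

which has a nonzero solution $\Theta > 0$ only if $\lambda \langle k^2 \rangle / \langle k \rangle > 1$, where $\langle k^2 \rangle$ is the second moment of the degree distribution. The epidemic threshold of the heterogeneous network based on degree is therefore 16

$$\lambda_c^{k} = \frac{\langle k \rangle}{\langle k^2 \rangle}. \tag{7}$$

For the metrics of H-index, coreness, and betweenness centrality, the dynamic processes are constructed analogously:

$$\frac{d\rho_h(t)}{dt} = -\rho_h(t) + \lambda h \left[1 - \rho_h(t)\right] \Theta_h(t), \tag{8}$$

$$\frac{d\rho_s(t)}{dt} = -\rho_s(t) + \lambda s \left[1 - \rho_s(t)\right] \Theta_s(t), \tag{9}$$

$$\frac{d\rho_b(t)}{dt} = -\rho_b(t) + \lambda b \left[1 - \rho_b(t)\right] \Theta_b(t), \tag{10}$$

where $\Theta_x(t) = \frac{1}{\langle x \rangle} \sum_{x'} x' P(x')\, \rho_{x'}(t)$ for $x \in \{h, s, b\}$, $P(x)$ is the distribution of metric $x$, $h$ is the metric for nodes with H-index $h$, $s$ is the metric for nodes with coreness $s$, and $b$ is the metric for nodes with betweenness $b$.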
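To make Eqs. (1) and (7) concrete, here is a minimal sketch that integrates the degree-based heterogeneous mean-field SIS equations with forward Euler steps, assuming an uncorrelated network, $\mu = 1$, and a hypothetical truncated power-law distribution $P(k) \propto k^{-3}$ on $3 \le k \le 50$. The integrator, step size, and function names are illustrative choices, not the solver used for the reported figures. Because the H-index, coreness, and betweenness models have the same form, passing the distribution of $h$, $s$, or $b$ in place of $P(k)$ gives the corresponding model.

```python
def power_law_pk(kmin, kmax, gamma):
    """Truncated power-law distribution P(k) ~ k^-gamma on kmin..kmax (normalized)."""
    weights = {k: k ** -gamma for k in range(kmin, kmax + 1)}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


def hmf_threshold(pk):
    """Eq. (7): lambda_c = <k> / <k^2>."""
    m1 = sum(k * p for k, p in pk.items())
    m2 = sum(k * k * p for k, p in pk.items())
    return m1 / m2


def hmf_sis_prevalence(pk, lam, rho0=0.05, dt=0.01, steps=10_000):
    """Forward-Euler integration of Eq. (1) with mu = 1 on an uncorrelated network.

    pk maps each class value k to P(k). Returns the total prevalence
    rho = sum_k P(k) rho_k after steps * dt time units.
    """
    mean_k = sum(k * p for k, p in pk.items())
    rho = {k: rho0 for k in pk}  # same initial infected fraction in every class
    for _ in range(steps):
        # Uncorrelated network: a single Theta(t) shared by all classes
        theta = sum(k * p * rho[k] for k, p in pk.items()) / mean_k
        rho = {k: r + dt * (-r + lam * k * (1.0 - r) * theta) for k, r in rho.items()}
    return sum(p * rho[k] for k, p in pk.items())


if __name__ == "__main__":
    pk = power_law_pk(3, 50, 3.0)  # hypothetical degree distribution
    lc = hmf_threshold(pk)
    print(f"lambda_c = {lc:.4f}")
    for factor in (0.5, 2.0):
        print(factor, hmf_sis_prevalence(pk, factor * lc))
```

Below the threshold the prevalence should decay toward zero, while above it a positive stationary prevalence emerges. For a homogeneous class such as `{4: 1.0}` the sketch reproduces the familiar result $\rho = 1 - \lambda_c/\lambda$.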
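For betweenness, $b_i = \sum_{u \neq i \neq v} \sigma_{uv}(i)$ represents the number of shortest paths through node $i$, where $\sigma_{uv}(i)$ is the number of shortest paths between nodes $u$ and $v$ that pass through $i$. Node betweenness is conventionally defined as $\sum_{u \neq i \neq v} \sigma_{uv}(i)/\sigma_{uv}$, that is, the sum over all node pairs of the fraction of shortest paths between $u$ and $v$ that pass through node $i$; here we instead use the number of shortest paths through node $i$ as the metric for modeling. For the metrics of H-index, coreness, and betweenness centrality, the same derivation gives the following epidemic thresholds:

$$\lambda_c^{h} = \frac{\langle h \rangle}{\langle h^2 \rangle}, \tag{11}$$

where $\langle h \rangle$ is the first moment and $\langle h^2 \rangle$ is the second moment of the H-index distribution;

$$\lambda_c^{s} = \frac{\langle s \rangle}{\langle s^2 \rangle}, \tag{12}$$

where $\langle s \rangle$ is the first moment and $\langle s^2 \rangle$ is the second moment of the coreness distribution; and

$$\lambda_c^{b} = \frac{\langle b \rangle}{\langle b^2 \rangle}, \tag{13}$$

where $\langle b \rangle$ is the first moment and $\langle b^2 \rangle$ is the second moment of the betweenness distribution.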
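The betweenness metric used here counts shortest paths rather than fractions of them. The hedged sketch below, assuming an unweighted undirected graph in adjacency-dictionary form and summing over unordered node pairs (the text does not say whether pairs are ordered), computes $b_i$ from BFS path counts and then evaluates the mean-field threshold $\langle x \rangle / \langle x^2 \rangle$ of Eqs. (7) and (11)–(13) for any list of per-node metric values. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """BFS from `source`: hop distances and numbers of shortest paths to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u is a predecessor of v
                sigma[v] += sigma[u]
    return dist, sigma


def shortest_path_counts(adj):
    """b_i = sum over unordered pairs {u, v}, u, v != i, of sigma_uv(i)."""
    info = {u: bfs_counts(adj, u) for u in adj}
    b = {i: 0 for i in adj}
    for u, v in combinations(adj, 2):
        dist_u, sigma_u = info[u]
        if v not in dist_u:
            continue  # u and v lie in different components
        for i in adj:
            if i in (u, v) or i not in dist_u:
                continue
            dist_i, sigma_i = info[i]
            # i lies on a shortest u-v path iff d(u, i) + d(i, v) == d(u, v)
            if dist_u[i] + dist_i[v] == dist_u[v]:
                b[i] += sigma_u[i] * sigma_i[v]
    return b


def mf_threshold(values):
    """Mean-field threshold <x> / <x^2> for a list of per-node metric values."""
    n = len(values)
    m1 = sum(values) / n
    m2 = sum(x * x for x in values) / n
    return m1 / m2


if __name__ == "__main__":
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    deg = [len(star[u]) for u in star]
    bet = list(shortest_path_counts(star).values())
    print(f"degree {mf_threshold(deg):.3f}, betweenness {mf_threshold(bet):.3f}")
    # prints: degree 0.400, betweenness 0.167
```

On a 4-cycle every node gets $b_i = 1$ with this count, whereas the conventional fractional betweenness gives 0.5, which shows how path counting enlarges the metric wherever several shortest paths exist.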
Optimized immunization strategies. For epidemic spreading in complex networks, immunization can effectively prevent the spread of disease. Proportional (uniform) immunization schemes can raise the epidemic threshold, and the heterogeneity of scale-free networks allows for more efficient strategies based on the hierarchy of nodes. Scholars have designed targeted immunization schemes in which the most highly connected nodes (i.e., the nodes most likely to spread the disease) are progressively immunized. Choosing the most effective metric to identify the most efficient spreaders for targeted immunization is therefore an important step. Contrary to common belief, the best spreaders do not necessarily correspond to the most highly connected or the most central people. In this study, four metrics (betweenness, degree, H-index, and coreness) are used to identify the most efficient spreaders for targeted immunization. For a fixed spreading rate $\lambda$, the control parameter is the immunization proportion $g$, defined as the fraction of immunized nodes in the network. In the mean-field model of uniform immunization, the presence of immune nodes reduces the spreading rate by the factor $(1 - g)$. Substituting $\lambda \to \lambda(1 - g)$ in Eq. (3) gives the prevalence as a function of the immunization proportion; the infection dies out when $\lambda(1 - g) < \lambda_c$, that is, when $g > g_c = 1 - \lambda_c/\lambda$. For targeted immunization, suppose that a proportion $g$ of the nodes with the highest connectivity is immunized. This corresponds to introducing an upper threshold $k_t$ such that all nodes with connectivity $k > k_t$ are immunized, so that at the mean-field level the proportion of immunized nodes is

$$g = \sum_{k > k_t} P(k).$$
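The two immunization rules above reduce to simple arithmetic. The sketch below, under the mean-field assumptions stated in the text, computes the uniform-immunization critical fraction $g_c = 1 - \lambda_c/\lambda$ and, for targeted immunization, the immunized fraction $g = \sum_{k > k_t} P(k)$ together with the smallest cutoff $k_t$ whose fraction does not exceed a requested $g$. This is a discrete approximation, because $g$ can only take the values allowed by the observed classes. The same functions apply with $h$, $s$, or $b$ in place of $k$; all names are hypothetical.

```python
def uniform_critical_fraction(lam, lam_c):
    """Uniform immunization: smallest g with lam * (1 - g) <= lam_c, i.e. 1 - lam_c / lam."""
    return max(0.0, 1.0 - lam_c / lam)


def immunized_fraction(pk, kt):
    """Targeted immunization: g = sum over k > k_t of P(k)."""
    return sum(p for k, p in pk.items() if k > kt)


def targeted_cutoff(pk, g):
    """Smallest observed cutoff k_t whose immunized fraction does not exceed g."""
    if g < 0:
        raise ValueError("g must be non-negative")
    # kt = max(pk) always qualifies (fraction 0), so next() cannot fail here
    return next(kt for kt in sorted(pk) if immunized_fraction(pk, kt) <= g)


if __name__ == "__main__":
    print(uniform_critical_fraction(0.15, 0.1))  # about 1/3 for a hypothetical lambda_c = 0.1
    pk = {1: 0.5, 2: 0.3, 5: 0.2}  # hypothetical degree distribution
    print(targeted_cutoff(pk, 0.25), immunized_fraction(pk, 2))  # prints: 2 0.2
```

With $\lambda = 0.15$, the value used in the simulations, a network whose mean-field threshold were 0.1 would need about one third of its nodes immunized at random, whereas removing only the highest class above $k_t$ already immunizes 20% of the nodes in the toy distribution.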
For the metrics of H-index, coreness, and betweenness, the fraction $g$ of immunized nodes is given analogously by

$$g = \sum_{h > h_t} P(h), \qquad g = \sum_{s > s_t} P(s), \qquad g = \sum_{b > b_t} P(b),$$

where $h_t$, $s_t$, and $b_t$ are the corresponding upper thresholds.

Data description. Six networks from different fields, four real networks and two BA scale-free networks, are used to simulate the performance of the four models in Eqs. (3), (8), (9), and (10): two communication networks (Email and ego-Facebook), one social network (Political blogs), one transportation network (USAir), and two BA scale-free networks. Email 36,37 describes email interchanges between users. Political blogs 38,39 is a network of weblogs on US politics. ego-Facebook 40,41 was collected from survey participants using the Facebook app. USAir 42 describes US air transportation. The two scale-free networks are generated with the algorithm proposed by Barabási and Albert 25 using different parameters. The topological features of these networks, including the numbers of nodes and links, average degree, average distance, and assortativity coefficient, are shown in Table 1.
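For readers who want to generate comparable test networks, here is a sketch of preferential attachment in the spirit of the Barabási–Albert algorithm. The initial core (a complete graph on m + 1 nodes), the parameter values, and the function name are assumptions; the text does not state the parameters used for its two BA networks.

```python
import random


def barabasi_albert_edges(n, m, seed=None):
    """Edge list of a BA network with n nodes in which each new node adds m links.

    Assumption: growth starts from a complete graph on m + 1 nodes.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Every node appears in `pool` once per incident edge, so a uniform draw
    # from `pool` selects a node with probability proportional to its degree.
    pool = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # m distinct targets, so no multi-edges
            targets.add(rng.choice(pool))
        for t in targets:
            edges.append((new, t))
            pool.extend((new, t))
    return edges


if __name__ == "__main__":
    edges = barabasi_albert_edges(1000, 3, seed=1)
    print(len(edges), 2 * len(edges) / 1000)  # number of links, average degree
```

Varying `m` changes the average degree (close to 2m for large n), which is one simple way to obtain two BA networks "with different parameters".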
The maximum betweenness, maximum degree, maximum H-index, and maximum coreness of these networks are shown in Table 2. For each network, the maximum betweenness is the largest, followed by the maximum degree, H-index, and coreness. This implies that node importance fluctuates most under betweenness, followed by degree, H-index, and coreness; that is, betweenness is the most fluctuant metric.

Results
Numerical simulations for epidemic model. To study the proposed mean-field methods based on the four metrics, numerical simulations of the SIS model on the six networks are used to display the spreading process. The stationary infection prevalence is simulated as a function of the spreading rate $\lambda$. The initial fraction of infected nodes is set to 0.05 and, without loss of generality, the curing probability is fixed at $\mu = 1$. Figures 1, 2 and 3 compare the infection prevalence $\rho$ predicted by the four models with Monte Carlo simulations for the six networks. Disease spreading models based on different metrics exhibit diverse spreading processes, and the model based on betweenness centrality exhibits the highest infection prevalence.

Comparing the epidemic thresholds of epidemic models. Since a more fluctuant metric leads to a larger second moment relative to its first moment, one obtains

$$\frac{\langle b^2 \rangle}{\langle b \rangle} > \frac{\langle k^2 \rangle}{\langle k \rangle} > \frac{\langle h^2 \rangle}{\langle h \rangle} > \frac{\langle s^2 \rangle}{\langle s \rangle},$$

which implies that the model based on betweenness centrality has the smallest epidemic threshold for all six networks, followed by the models based on degree, H-index, and coreness. Therefore, we conclude that

$$\lambda_c^{b} < \lambda_c^{k} < \lambda_c^{h} < \lambda_c^{s}. \tag{17}$$

Figures 4, 5 and 6 compare the epidemic thresholds obtained from the theoretical analysis in Eqs. (7), (11), (12), and (13) with those obtained from numerical simulations. The theoretical analysis agrees well with the numerical simulations, with minor deviations, and the simulation results are consistent with the ordering in Eq. (17).
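The Monte Carlo simulations compared with the mean-field curves can be approximated by a synchronous discrete-time SIS process. This is a minimal sketch under assumptions the text does not state: each infected neighbour transmits with probability λ per step, each infected node recovers with probability μ per step, a fixed number of burn-in steps is discarded, and results are averaged over independent runs. In this setting λ is a per-step probability and is not identical to the continuous-time rate in Eq. (1). The optional `immune` set holds nodes that can never be infected; the defaults `init_frac=0.05` and `mu=1.0` follow the text, while the other parameters are hypothetical.

```python
import random


def sis_monte_carlo(adj, lam, mu=1.0, init_frac=0.05, steps=200, burn_in=100,
                    runs=20, immune=frozenset(), seed=0):
    """Average stationary prevalence of a synchronous discrete-time SIS process.

    At each step every infected node recovers with probability mu, and every
    non-immune node with m infected neighbours is (re)infected with probability
    1 - (1 - lam)**m. Prevalence is averaged over the steps after `burn_in`
    and over `runs` independent runs.
    """
    rng = random.Random(seed)
    nodes = [u for u in adj if u not in immune]
    n_init = max(1, round(init_frac * len(nodes)))
    total = 0.0
    for _ in range(runs):
        infected = set(rng.sample(nodes, n_init))
        acc = 0.0
        for t in range(steps):
            new_infected = set()
            for u in nodes:
                m = sum(1 for v in adj[u] if v in infected)
                stays = u in infected and rng.random() >= mu  # fails to recover
                caught = m > 0 and rng.random() < 1.0 - (1.0 - lam) ** m
                if stays or caught:
                    new_infected.add(u)
            infected = new_infected
            if t >= burn_in:
                acc += len(infected) / len(adj)
        total += acc / (steps - burn_in)
    return total / runs
```

Sweeping `lam` and plotting the returned prevalence gives a simulated counterpart to the mean-field prevalence curves; the onset of a nonzero prevalence gives a rough simulated threshold.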

Comparing immunization effect of different immunization strategies. The immunization behaviors of the SIS model on the six networks are studied through numerical simulations of different immunization schemes. Both targeted and uniform immunization are tested by immunizing $gN$ nodes. For a network of fixed size $N$, uniform immunization randomly selects and immunizes $gN$ nodes, whereas targeted immunization immunizes the $gN$ most central nodes. Node centrality is measured by betweenness, degree, H-index, and coreness, and the nodes at the top of each ranked list are immunized. The stationary infection prevalence $\rho_g$ as a function of the immunization proportion $g$ is obtained through numerical simulation. The initial fraction of infected nodes is set to 0.05, the spreading rate is fixed at $\lambda = 0.15$, and the curing probability is fixed at $\mu = 1$. The infection prevalence is averaged over 100 runs of each model. Figures 7, 8 and 9 show the relative prevalence $\rho_g/\rho_0$ (where $\rho_0$ is the prevalence without immunization) as a function of the immunization proportion $g$. For uniform immunization, the prevalence decreases gradually and reaches zero only at a large immunization threshold. In contrast, for targeted immunization the prevalence drops sharply and vanishes at a small immunization threshold (the immunization threshold is the value of $g$ at which $\rho_g/\rho_0$ reaches 0 in Figs. 7, 8 and 9). The targeted immunization strategy using betweenness centrality to identify the most important nodes performs inconsistently across the real networks: it has the second-best effect for the USAir and email-Eu-core networks (Fig. 7) but the worst effect among the four targeted strategies for the Political blogs and ego-Facebook networks (Fig. 8).
However, it has the best targeted immunization effect for the two BA scale-free networks (Fig. 9). Excluding the targeted immunization based on betweenness centrality, the strategy using degree centrality is the most effective for all networks and yields the smallest immunization threshold, followed by the strategies using H-index and coreness centrality.
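The node selection step of the uniform and targeted schemes can be sketched as follows, assuming the centrality scores have already been computed. Tie-breaking by node label and rounding $gN$ to the nearest integer are assumptions, since the text does not specify them; the function name is hypothetical.

```python
import random


def select_immune(scores, g, targeted=True, seed=None):
    """Choose the gN nodes to immunize.

    scores maps node -> centrality (degree, H-index, coreness or betweenness).
    targeted=True returns the gN highest-scoring nodes (ties broken by node label);
    targeted=False returns gN nodes drawn uniformly at random.
    """
    n_immune = round(g * len(scores))
    if not targeted:
        return set(random.Random(seed).sample(sorted(scores), n_immune))
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return set(ranked[:n_immune])


if __name__ == "__main__":
    degree = {0: 4, 1: 1, 2: 1, 3: 1, 4: 1}  # star graph centred on node 0
    print(select_immune(degree, 0.2))  # prints: {0}
    print(select_immune(degree, 0.4, targeted=False, seed=3))
```

The selected set can then be excluded from infection in a stochastic SIS simulation, and $\rho_g/\rho_0$ obtained by dividing the resulting prevalence by the prevalence with no immune nodes.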

Discussion
The models for disease propagation based on betweenness exhibit the widest propagation, followed by the models using degree, H-index, and coreness. Targeted immunization based on betweenness exhibits the best immunization effect on BA scale-free networks, followed by the strategies using degree, H-index, and coreness. Targeted immunization based on betweenness performs inconsistently on the real networks because their degree distributions do not follow a standard power law. The degree distributions of the four real networks and the two BA scale-free networks are displayed in Figs. 10, 11 and 12. The right panels of Figs. 8 and 11 show that ego-Facebook, whose degree distribution is the least consistent with a power law, has the worst targeted immunization effect, and Figs. 9 and 12 show that the two BA scale-free networks, whose degree distributions are the most consistent with a power law, have the best targeted immunization effect. Targeted immunization based on node betweenness therefore performs best on networks with a standard power-law degree distribution (that is, a degree distribution that appears as a straight line on double logarithmic scales). When betweenness is excluded, the models for disease propagation and targeted immunization based on degree exhibit the best propagation and immunization effects, followed by those using H-index and coreness. The key to these conclusions is the heterogeneity of node centrality: the greater the fluctuation of node centrality, the more favorable it is …

Figure 9. Comparison of the relative prevalence $\rho_g/\rho_0$ as a function of the immunization proportion $g$ under uniform immunization and four targeted immunization strategies for scale-free 1 (left) and scale-free 2 (right), at a fixed spreading rate $\lambda = 0.15$.
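Whether a degree distribution appears as a straight line on double logarithmic scales can be checked roughly with a least-squares fit of $\log P(k)$ against $\log k$. This sketch is only a crude diagnostic and not necessarily how Figs. 10, 11 and 12 were assessed; maximum-likelihood estimators are generally preferred for estimating power-law exponents. The function name is hypothetical.

```python
import math
from collections import Counter


def loglog_fit(degrees):
    """Least-squares line through (log k, log P(k)) for the empirical degree distribution.

    Returns (slope, r_squared). A slope of -gamma with r_squared close to 1 means
    the distribution looks like a straight line on double logarithmic scales.
    Requires at least two distinct positive degrees with different frequencies.
    """
    counts = Counter(k for k in degrees if k > 0)
    n = sum(counts.values())
    ks = sorted(counts)
    xs = [math.log(k) for k in ks]
    ys = [math.log(counts[k] / n) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Synthetic degrees with counts exactly proportional to k^-2 for k = 1..6
    degrees = [k for k in range(1, 7) for _ in range(3600 // (k * k))]
    slope, r2 = loglog_fit(degrees)
    print(f"slope {slope:.3f}, R^2 {r2:.3f}")  # prints: slope -2.000, R^2 1.000
```

On real data the tail bins contain few nodes and are noisy, so a high $R^2$ alone should not be read as proof of a power law.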
The simulations confirm that the most fluctuant metric is effective both for disease propagation and for identifying the spreading influence of nodes for targeted immunization. These results reveal that the most fluctuant metric can be used to construct the fastest disease spreading model and to identify influential spreaders for targeted immunization. For networks with a standard power-law degree distribution, the best immunization effect is obtained by using betweenness centrality to find the most influential spreaders as immunization targets. However, for many real networks with nonstandard power-law distributions, the best immunization effect is obtained by using degree centrality to find the most influential spreaders. Coronavirus disease 2019 (COVID-19) has caused a global outbreak, so it is necessary to investigate control strategies to develop health care plans. The findings of this paper can help identify the targets that need immunization and thus effectively control the spread of the epidemic.

Conclusion
Models for disease propagation and targeted immunization strategies based on node betweenness, degree, H-index, and coreness are proposed in this study. The disease propagation model based on betweenness always exhibits the widest propagation, followed by those using degree, H-index, and coreness. Targeted immunization based on betweenness exhibits the best immunization effect in BA scale-free networks, followed by that using degree, H-index, and coreness. However, in the four real networks, targeted immunization based on node degree exhibits the best immunization effect, while that based on betweenness exhibits inconsistent effects across networks (second-best or worst), because the degree distributions of these real networks do not conform to a standard power law. The results provide a reference for the public health sector in proposing effective measures for disease prevention and control.

Data availability
All relevant data can be downloaded from their respective web pages.